Causal-aware Safe Policy Improvement for Task-oriented dialogue
The recent success of reinforcement learning (RL) in solving complex tasks
is most often attributed to its capacity to explore and exploit an environment
where it has been trained. Sample efficiency is usually not an issue since
cheap simulators are available to sample data on-policy. On the other hand,
task-oriented dialogues are usually learnt from offline data collected using
human demonstrations. Collecting diverse demonstrations and annotating them is
expensive. Unfortunately, RL methods trained on off-policy data are
prone to issues of bias and generalization, which are further exacerbated by
stochasticity in human responses and the non-Markovian belief state of a dialogue
management system. To this end, we propose a batch RL framework for
task-oriented dialogue policy learning: causal-aware safe policy improvement
(CASPI). This method gives guarantees on the dialogue policy's performance and also
learns to shape rewards according to intentions behind human responses, rather
than just mimicking demonstration data; this, coupled with batch RL, helps the overall
sample efficiency of the framework. We demonstrate the effectiveness of
this framework on the dialogue-context-to-text generation and end-to-end dialogue
tasks of the MultiWOZ2.0 dataset. The proposed method outperforms the current
state of the art on the evaluation metrics in both cases. In the end-to-end case, our
method, trained on only 10% of the data, was able to outperform the current state
of the art in three out of four evaluation metrics.
GAEA: Graph Augmentation for Equitable Access via Reinforcement Learning
Disparate access to resources by different subpopulations is a prevalent
issue in societal and sociotechnical networks. For example, urban
infrastructure networks may enable certain racial groups to more easily access
resources such as high-quality schools, grocery stores, and polling places.
Similarly, social networks within universities and organizations may enable
certain groups to more easily access people with valuable information or
influence. Here we introduce a new class of problems, Graph Augmentation for
Equitable Access (GAEA), to enhance equity in networked systems by editing
graph edges under budget constraints. We prove such problems are NP-hard, and
cannot be approximated within a factor of . We develop a
principled, sample- and time-efficient Markov Reward Process (MRP)-based
mechanism design framework for GAEA. Our algorithm outperforms baselines on a
diverse set of synthetic graphs. We further demonstrate the method on
real-world networks, by merging public census, school, and transportation
datasets for the city of Chicago and applying our algorithm to find
human-interpretable edits to the bus network that enhance equitable access to
high-quality schools across racial groups. Further experiments on Facebook
networks of universities yield sets of new social connections that would
increase equitable access to certain attributed nodes across gender groups.